Hierarchical multi-task learning for low-complexity models based on task synergy analysis
Abstract
The widespread adoption of wearable devices and smart home systems points to significant growth in the potential use cases for such solutions. The abundance of devices and the need to interact with them conveniently drive the active development of approaches that implement various aspects of this interaction. Speech is currently one of the most convenient human-machine interfaces. Advances in audio and speech signal processing and analysis make it possible to solve complex tasks such as automatic speech recognition, speaker identification and verification, and detection of the speaker's emotions, gender, and age. Such technologies typically require significant computational resources, which are often unavailable on wearable devices and smart home systems. Solving audio/speech analysis tasks in isolation significantly limits human-machine interaction scenarios, while combining several technologies on a single device increases the demands on computational resources. The greatest interest therefore lies in multi-task audio/speech signal analysis technologies with reduced computational requirements, which can be applied in wearable devices and smart home systems. This paper proposes a method for the automatic construction of hierarchical multi-task models for audio/speech signal analysis. The method determines task compatibility while maintaining overall accuracy across all tasks and significantly reducing the number of trainable parameters in the multi-task model. In the first stage, an isolated recognition model is trained for each target task and its metrics are measured. In the second stage, the pairwise compatibility of audio/speech analysis tasks is determined by iterating over the number of shared layers in a deep neural network. In the third stage, the final hierarchical architecture implementing the multi-task recognition model is formed automatically.
It is demonstrated that, compared to baseline approaches, the developed method yields a compact hierarchical model. Compared to a set of independent single-task models, the proposed architecture has 56% fewer trainable parameters with an accuracy drop of no more than 1.9%, whereas a classical (“flat”) multi-task architecture shows an accuracy reduction of 2.7%. Applying existing multi-task model optimization approaches, LT4REC and the Lottery Ticket Hypothesis, leads to accuracy reductions of 9% and 6.5%, respectively. The results have practical significance for the smart device industry (smartphones, wearable gadgets, smart speakers). The proposed algorithm makes it possible to build efficient audio analysis systems that perform multiple functions simultaneously with minimal computational and memory requirements when deployed on resource-constrained devices.
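The three stages described above can be illustrated with a minimal sketch. The abstract does not give the exact compatibility criterion or the rule used to assemble the hierarchy, so everything concrete here is an assumption made for illustration: the `tolerance` threshold on accuracy drop, the hypothetical `evaluate_pair` callback (standing in for training a two-task network with a given number of shared layers), the toy accuracy numbers, and the greedy grouping rule that merges the task groups able to share the most layers first.

```python
from itertools import combinations


def max_shared_depth(a, b, baseline, evaluate_pair, n_layers, tolerance):
    """Stage 2 (assumed criterion): largest number of shared layers for which
    neither task loses more than `tolerance` accuracy versus its single-task model."""
    best = 0
    for n_shared in range(1, n_layers + 1):
        acc_a, acc_b = evaluate_pair(a, b, n_shared)
        if baseline[a] - acc_a <= tolerance and baseline[b] - acc_b <= tolerance:
            best = n_shared
    return best


def pairwise_depths(tasks, baseline, evaluate_pair, n_layers, tolerance):
    """Compatibility table: one shared-depth value per unordered task pair."""
    return {
        frozenset((a, b)): max_shared_depth(a, b, baseline, evaluate_pair, n_layers, tolerance)
        for a, b in combinations(tasks, 2)
    }


def build_hierarchy(tasks, depths):
    """Stage 3 (assumed rule): greedily merge the two groups that can share the
    most layers; a group pair is limited by its least compatible task pair."""
    groups = [frozenset([t]) for t in tasks]
    merges = []
    while len(groups) > 1:
        best_pair, best_depth = None, 0
        for g1, g2 in combinations(groups, 2):
            depth = min(depths[frozenset((a, b))] for a in g1 for b in g2)
            if depth > best_depth:
                best_pair, best_depth = (g1, g2), depth
        if best_pair is None:
            break  # the remaining groups cannot share any layers
        merged = best_pair[0] | best_pair[1]
        groups = [g for g in groups if g not in best_pair] + [merged]
        merges.append((merged, best_depth))
    return merges


# Stage 1 stand-in: toy single-task accuracies (illustrative, not from the paper).
tasks = ["emotion", "gender", "age"]
baseline = {"emotion": 0.80, "gender": 0.95, "age": 0.85}

# Toy replacement for real training: sharing beyond a per-pair limit costs accuracy.
limits = {
    frozenset(("gender", "age")): 3,
    frozenset(("emotion", "gender")): 1,
    frozenset(("emotion", "age")): 1,
}


def toy_evaluate_pair(a, b, n_shared):
    drop = 0.0 if n_shared <= limits[frozenset((a, b))] else 0.05
    return baseline[a] - drop, baseline[b] - drop


depths = pairwise_depths(tasks, baseline, toy_evaluate_pair, n_layers=4, tolerance=0.02)
hierarchy = build_hierarchy(tasks, depths)
for group, depth in hierarchy:
    print(sorted(group), "share the first", depth, "layer(s)")
```

With these toy numbers, gender and age end up sharing three layers, and emotion joins them only at the first layer, which gives a branching structure rather than a flat shared trunk. In a real setting `toy_evaluate_pair` would be replaced by actual training and evaluation runs, which dominate the cost of the procedure.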